Platform Engineering: A Guide for Technical, Product, and People Leaders

Authors: Camille Fournier, Ian Nowland

Overview

This book is a practical guide for technology leaders looking to build and manage effective internal platforms. It addresses the growing complexity in software development caused by the proliferation of cloud services and open-source components, arguing that platform engineering is the solution to this complexity crisis.

We argue that traditional approaches to managing shared infrastructure and tooling often fail due to a lack of customer focus, inflexibility, and instability. We introduce the concept of platform engineering as a discipline focused on building internal platforms as curated products that provide self-service capabilities, reduce operational overhead, and empower application development teams.

The book is structured around several key concepts. We introduce “the four pillars of platform engineering”: product, development, breadth, and operations. We then delve into practical strategies for building and operating platforms, including detailed advice on team composition, product management, navigating organizational politics, handling migrations, and fostering a culture of customer empathy. We address common challenges such as the ‘over-general swamp’ caused by excessive integration code (‘glue’), advocating for a product-led approach to platform development.

Throughout the book, we emphasize the importance of clear communication with stakeholders, careful planning, iterative development, and a focus on measurable impact. We draw on our own extensive experience leading platform teams at various organizations, supplementing our perspectives with insights from other industry experts.

The book is aimed primarily at technical, product, and people leaders within organizations that build and operate software platforms, but its insights are also relevant to technology leaders more broadly who may be struggling to understand and manage the complexity of their technology stack.

The book aims to provide a practical playbook for building platforms that are not just functional but are trusted, loved, and ultimately, drive engineering productivity and business value. In a rapidly evolving technological landscape, this book offers a valuable resource for navigating complexity and building a sustainable foundation for innovation and growth.

Book Outline

1. Why Platform Engineering Is Becoming Essential

Modern software development is increasingly complex due to the proliferation of cloud services and open-source components. Centralized teams created to manage this shared complexity often fail due to inflexibility, lack of customer focus, and instability. Platform engineering offers a better approach by building ‘platforms’: curated, internal products that provide self-service capabilities for application teams, reducing overall system complexity and increasing developer productivity. The key is to limit the number of underlying components (primitives) and the amount of ‘glue’ needed to connect them, enabling easier maintenance, faster iteration, and smoother migrations.

Key concept: The over-general swamp: This describes a common anti-pattern in software development where organizations accrue a large amount of ‘glue’ — custom integration code, scripts, and configurations — due to application teams independently choosing and connecting various cloud and open-source components. This creates a tangled, complex system that is difficult to maintain and evolve.

2. The Pillars of Platform Engineering

Effective platform engineering relies on four key pillars. First, a curated product approach that prioritizes user needs and offers tailored solutions. Second, software-based abstractions to encapsulate complexity and facilitate easier use and maintenance. Third, breadth of support for a wide range of developers and use cases. Finally, a focus on reliable operation of the platform as a foundational element of the business. These principles ensure platforms are built for and adopted by their intended users while reducing operational overhead and maximizing leverage for the organization.

Key concept: The four pillars of platform engineering: 1. Product: Adopt a curated product approach, building paved paths (for common use cases) and railways (for filling strategic gaps). 2. Development: Create software-based abstractions to manage complexity. 3. Breadth: Serve a broad base of application developers, providing self-service capabilities, guardrails, and multi-tenancy. 4. Operations: Operate platforms as reliable foundations for the business.

3. How and When to Get Started

At small scale, informal cooperation around shared code works well, but as teams and codebases grow, dedicated platform teams and more formal processes become necessary. When transitioning from a cooperative to a centralized model, expect some resistance and communication challenges. It is crucial to focus on solving current problems rather than adopting new technologies or architectures prematurely, and carefully curate your team composition to balance systems and software expertise.

Key concept: Dunbar’s number: The cognitive limit to the number of people with whom one can maintain stable social relationships, often cited as around 150. Beyond this number, more formal management and communication structures become necessary.

4. Building Great Platform Teams

Building a high-performing platform engineering team requires a diverse set of skills and perspectives. Hiring should focus on finding engineers who are not only strong software developers but also have an understanding of and interest in systems. It’s essential to hire for customer empathy and operational experience, even within software engineering roles, and build a culture that values these qualities. Recognize and reward engineers in all roles, even those whose contributions may not be immediately obvious.

Key concept: Platform engineering team roles: Effective platform teams require a blend of roles, including software engineers (system-focused developers comfortable with operations), systems engineers (DevOps generalists with broad systems knowledge), reliability engineers (focused on system reliability), and systems specialists (experts in specific areas like networking or storage).

5. Platform as a Product

Platforms should be treated as products, complete with product discovery, iterative development, customer support, and clear roadmaps. This requires balancing immediate needs with long-term vision and navigating the tension between building what seems fun versus what is strategically valuable. Effective platform product management helps achieve this balance and enables the platform to support innovation within its boundaries.

Key concept: Platform as a product: Treat your internal platform like an external product, focusing on customer needs, iterative development, and measurable impact. This requires a product mindset, ongoing customer engagement, and a roadmap that balances user needs with operational requirements.

6. Operating Platforms

Operating platforms as reliable foundations involves taking full ownership of operations, providing user support, and establishing a culture of operational discipline. A merged DevOps approach, where engineers share responsibility for both development and operations, is crucial for platform teams. Establish sustainable on-call practices and clear support processes to ensure your platform team can handle the operational load without burning out. User support becomes a critical aspect, forcing engineers to understand and empathize with customer challenges.

Key concept: Sustainable on-call load: Aim for fewer than five business-impacting pages per week for your on-call engineers. Prioritize platform stability over new features if your on-call load is too high. Merge DevOps/SRE responsibilities with development to create well-rounded platform engineers.
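The paging guideline above lends itself to a simple automated check. The following is a minimal sketch, assuming you can export the dates of business-impacting pages from your alerting tool; the function names and the input format are hypothetical, and only the threshold of five comes from the guideline.

```python
from collections import Counter
from datetime import date

# From the guideline: aim for fewer than five business-impacting pages per week.
PAGE_BUDGET_PER_WEEK = 5


def weekly_page_counts(page_dates):
    """Count pages per ISO (year, week). Input is an iterable of date objects."""
    counts = Counter()
    for d in page_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


def weeks_over_budget(page_dates, budget=PAGE_BUDGET_PER_WEEK):
    """Return the ISO weeks whose page count reached or exceeded the budget."""
    counts = weekly_page_counts(page_dates)
    return sorted(week for week, n in counts.items() if n >= budget)
```

A team might run a check like this in a weekly review and treat any flagged week as a signal to prioritize stability work over new features.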

7. Planning and Delivery

For long-running platform projects, thorough planning and clear communication are essential. Use proposals and action plans to outline goals, dependencies, and metrics for success. Avoid relying too heavily on project managers early on, as this can hinder collaboration and lead to inaccurate estimates. Focus on incremental delivery and frequent communication with stakeholders to build trust and avoid the “long slog” where projects drag on indefinitely.

Key concept: Long-running project planning: For projects spanning months or years, create a clear proposal document outlining the problem, potential solutions, chosen solution and rationale, and a high-level plan. Break down projects into quarterly or monthly milestones, and consider the need for additional resources during testing, integration, and migration.

8. Rearchitecting Platforms

Rearchitecting, an iterative approach to system redesign, is often preferable to building a v2. Rearchitecting allows for incremental delivery of value, reduces migration costs, and aligns better with the existing platform culture. Focus on rearchitectures that enable high-value features or address critical system limitations, and secure leadership buy-in for long-term investment.

Key concept: Rearchitecting vs. v2: Rearchitecting, the iterative improvement of an existing system, is preferred to building a v2 (a complete replacement) as it reduces risk, limits scope, and allows for incremental value delivery. Prioritize rearchitectures that enable key features or address critical shortcomings, and plan for migrations.

9. Migrations and Sunsetting of Platforms

Migrations, the necessary evil of evolving platforms, are an opportunity to demonstrate platform value. Focus on making migrations as transparent and painless as possible for users. Invest in automation, backward compatibility, clear documentation, and proactive support to reduce customer effort and minimize disruption.

Key concept: Transparent migrations: Strive for migrations that are transparent to users. Achieve this through techniques like abstraction, automation, and backward compatibility. Minimize customer effort by providing clear documentation, tooling, and support for migrations.

10. Managing Stakeholder Relationships

Effectively managing stakeholder relationships is crucial for platform success. Use the power-interest grid to understand your stakeholders’ influence and tailor your communication accordingly. Be clear about the business impact of your work and avoid oversharing technical details. Seek compromises when possible, but be prepared to say ‘no’ or ‘not yet’ when necessary, while preserving the relationship.

Key concept: Power-interest grid: A stakeholder mapping tool used to categorize stakeholders based on their level of power within the organization and their level of interest in your work. This helps prioritize engagement strategies for different stakeholder groups.
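The grid can be sketched as a two-axis classification. This is an illustrative sketch only: the 0-to-1 scoring scale, the 0.5 threshold, and the `Stakeholder` type are assumptions, and the four quadrant labels are the ones commonly used with this tool rather than wording taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stakeholder:
    name: str
    power: float     # hypothetical 0.0-1.0 score of organizational influence
    interest: float  # hypothetical 0.0-1.0 score of interest in your work


def engagement_strategy(stakeholder, threshold=0.5):
    """Map a stakeholder to one of the four quadrants of the grid."""
    high_power = stakeholder.power >= threshold
    high_interest = stakeholder.interest >= threshold
    if high_power and high_interest:
        return "manage closely"
    if high_power:
        return "keep satisfied"
    if high_interest:
        return "keep informed"
    return "monitor"
```

In practice the scores are judgment calls. The value of the exercise is in making those judgments explicit and revisiting them as the organization changes.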

11. Your Platforms Are Aligned

Team alignment is critical for avoiding duplicated efforts and conflicting priorities. Ensure all platform teams share a common purpose centered around the four pillars of platform engineering. Align product strategies and planning across platforms through independent product management, cross-platform architectural reviews, and open communication. Address misalignment through judicious restructuring and prioritization of efforts.

Key concept: Product-focused alignment: Ensure that all platform teams share a common purpose centered around the four pillars of platform engineering. Align product strategies across platforms through independent product management and cross-platform architectural review, and establish shared operational practices to foster a unified team culture.

12. Your Platforms Are Trusted

Earning and maintaining customer trust is essential for platform adoption and long-term success. Prioritize operational stability, provide high-quality support, and engage actively with your customers to demonstrate that you understand their challenges and are working to solve their problems. Secure technical and leadership buy-in for major investments before starting and communicate their value through demonstrable outcomes.

Key concept: Customer empathy and operational excellence: Build trust by prioritizing operational stability, providing excellent user support, and fostering a culture of customer empathy within your team. Secure technical stakeholder buy-in for large investments and demonstrate their value through measurable outcomes.

13. Your Platforms Manage Complexity

Effective platforms actively manage complexity rather than merely trying to eliminate it. Be mindful of ‘accidental complexity,’ the additional complexity introduced by the platform itself through poor design, unclear documentation, or overreliance on manual processes. Embrace product discovery to identify the simplest solutions and prioritize user experience and operational stability.

Key concept: Accidental complexity of human coordination: Platforms aim to reduce complexity, but poorly designed platforms can introduce new complexity through manual workarounds, excessive documentation, and inter-team dependencies. Mitigate this by providing self-service tooling, clear APIs, and minimizing the need for human intervention.

14. Your Platforms Are Loved

‘Love’ is a valuable, though qualitative, indicator of platform success. Loved platforms ‘just work,’ reducing user friction and fostering a sense of delight. Avoid focusing on simplistic metrics like adoption or efficiency at the expense of user experience. Prioritize understanding and addressing user needs through customer collaboration and iterative product discovery.

Key concept: ‘Love’ as a success metric: Loved platforms are often simple, well-designed tools that solve a specific problem effectively. They ‘just work,’ reducing user friction and making tasks enjoyable. Strive for ‘love’ as a proxy for increased productivity, rather than focusing on simplistic metrics like adoption.

Essential Questions

1. What problem does platform engineering solve, and how?

Platform engineering addresses the increasing complexity in software systems by building internal platforms that reduce cognitive load and operational overhead for application developers. It’s about building reusable, self-service capabilities that allow application teams to focus on building features that deliver business value, rather than on undifferentiated infrastructure management or integration tasks. By abstracting away underlying complexities, platform teams enable application teams to move faster, iterate more quickly, and manage changes more easily, ultimately leading to greater productivity and faster delivery of business value. This leads to economies of scale and reduces the long-term cost of software by minimizing maintenance overhead. Building platforms may involve initial investment and cultural change, but the long-term benefits in terms of developer productivity and reduced operational costs outweigh the initial expenses.

2. What are the four pillars of platform engineering, and why are they important?

The four pillars of platform engineering are product, development, breadth, and operations. The product pillar emphasizes a curated product approach, focusing on user needs and tailoring solutions. Development centers on creating software-based abstractions that manage complexity and enable self-service capabilities. Breadth aims to serve a wide base of application developers through self-service interfaces, guardrails, and multi-tenancy. Operations ensures platforms are run as reliable foundations through proactive monitoring, support, and operational discipline. These pillars work together to create platforms that are not just functional, but also user-friendly, reliable, scalable, and aligned with business needs. Neglecting any of these pillars leads to suboptimal platform design and ultimately diminishes the value of the platform team.

3. What is the ‘over-general swamp’, how does it form, and how does platform engineering offer an escape?

The over-general swamp arises from the unchecked proliferation of choices in technology and a lack of governance, where teams select tools and components independently, often prioritizing short-term gains over long-term maintainability. This results in a tangled web of custom integrations (“glue”), making the system complex, fragile, and slow to evolve. Platform engineering provides an escape from this swamp by curating a limited set of technologies, abstracting them behind self-service platforms, and managing their integration and operation. By doing so, platform teams reduce the amount of custom glue needed, enabling faster iteration, smoother upgrades, and better management of the overall system complexity. A key takeaway here is the importance of centralized governance and standardization to manage the inherent complexity of a growing technology stack. This allows for economies of scale where maintenance and upgrades are managed for the entire organization instead of each team needing to develop their own solutions.

4. Why is a product mindset essential for platform engineering, and what does it entail?

A product mindset is not just about throwing PMs at the problem. Platform engineering requires a significant cultural shift, especially from engineering teams accustomed to a project-based or purely technical mindset. It requires actively listening to application developers, understanding their needs, and building solutions tailored to those needs. It’s about building trust with these internal customers and recognizing that their priorities may not always align perfectly with the platform team’s. This also means measuring success through customer feedback and adoption, rather than just hitting internal metrics or completing projects. This cultural shift requires buy-in and reinforcement from leadership, changes in hiring and promotion processes to value customer empathy and operational experience, and consistent engagement with users. This book makes a case for developing the organizational maturity to execute well rather than just building what’s fun.

5. How do successful platform teams balance the need for stability with the need to support innovation and change?

Balancing stability and innovation within platforms is a core leadership challenge. Platforms should provide a stable foundation for the business, but must also adapt and evolve to support new technologies, changing business needs, and opportunities for efficiency, reliability, security, and performance improvements. To achieve this balance, platform engineering leaders must prioritize initiatives strategically, considering not only the technical value but also the customer impact, migration costs, and organizational alignment. This requires a deep understanding of customer needs, a willingness to compromise with stakeholders, and a disciplined approach to operations and change management. It often means being intentional about when to say ‘yes’ or ‘no’ to user requests.

Key Takeaways

1. Treat your platform as a product

By treating a platform as a product, platform teams can better understand and address the needs of their internal customers – the application development teams. A product mindset leads to better prioritization of features, more effective communication, and a focus on delivering value. This approach ensures that the platform evolves in a way that maximizes its impact on the organization by focusing on the problems that are most important to solve, and avoids building systems for the sake of systems or due to the latest trends.

Practical Application:

In an AI product engineering team, building a platform for managing machine learning models (MLOps) can greatly improve efficiency. Instead of each team reinventing the wheel for model training, deployment, and monitoring, the platform team can provide standardized tools and workflows, accelerating the delivery of AI-powered features. For example, the platform can automate model deployment to various environments, standardize model performance monitoring, and manage access control to sensitive data.

2. Focus on customer empathy

Engaging with your internal customers – the application development teams – is crucial for building trust and ensuring that your platform meets their needs. Actively listen to their concerns, provide timely support, and involve them in the planning and development process. By treating them as valued partners, you can foster a collaborative relationship that leads to better platform adoption and greater organizational success.

Practical Application:

When implementing a new deep learning framework, don’t just give it to your data scientists and ask them to figure it out. Partner with them during the initial rollout and integration phase, providing support, documentation, and training tailored to their needs. This not only helps them get up to speed quickly but also gathers valuable feedback about how the platform can be improved to better support their specific workflows and use cases. For instance, perhaps they identify a need for GPU scheduling that you did not consider.

3. Manage complexity proactively

Complexity management is an ongoing challenge for platform teams. Don’t just focus on providing quick fixes or enabling short-term user requests. Instead, understand the root cause of problems and design solutions that address the underlying complexities rather than moving them around. This may require a willingness to prioritize less glamorous improvements over new features and to say “no” to requests if they contribute to unmanageable complexity or distract from solving more important problems.

Practical Application:

When faced with a request to integrate a new vector database into your MLOps platform, don’t just throw engineers at the problem to implement it quickly. Take a step back to consider the underlying problem. Is there another way to address the need for storing and querying embeddings, perhaps by improving indexing on your existing search service, or by changing the way these embedding features are trained and deployed? Could you create an abstraction that allows both solutions, then measure to see which one performs better?
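The idea of one abstraction over two candidate solutions can be sketched as follows. Everything here is hypothetical: `EmbeddingStore` is an invented interface, and both classes are small in-memory stand-ins (one for an existing search service, one for a dedicated vector database), not real clients. The sketch assumes non-zero vectors.

```python
import math
import time
from typing import Protocol, Sequence


class EmbeddingStore(Protocol):
    """Hypothetical platform interface; callers never see which backend is used."""

    def add(self, key: str, vector: Sequence[float]) -> None: ...

    def nearest(self, query: Sequence[float], k: int) -> list[str]: ...


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


class CosineScanStore:
    """Stand-in for the existing search service: cosine similarity at query time."""

    def __init__(self):
        self._items = {}

    def add(self, key, vector):
        self._items[key] = list(vector)

    def nearest(self, query, k):
        qn = _norm(query)

        def score(item):
            v = item[1]
            return sum(a * b for a, b in zip(query, v)) / (qn * _norm(v))

        ranked = sorted(self._items.items(), key=score, reverse=True)
        return [key for key, _ in ranked[:k]]


class PreNormalizedStore:
    """Stand-in for a vector database: normalizes each vector once at write time."""

    def __init__(self):
        self._items = {}

    def add(self, key, vector):
        n = _norm(vector)
        self._items[key] = [x / n for x in vector]

    def nearest(self, query, k):
        def score(item):
            return sum(a * b for a, b in zip(query, item[1]))

        ranked = sorted(self._items.items(), key=score, reverse=True)
        return [key for key, _ in ranked[:k]]


def time_queries(store, queries, k=1):
    """Run the same queries against any store; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [store.nearest(q, k) for q in queries]
    return results, time.perf_counter() - start
```

Because application code depends only on the interface, the platform team can run the same workload through both backends, compare results and timings, and later switch backends without requiring changes from its users.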

4. Plan for migrations

A common mistake is to underestimate the importance of migrations when planning major changes like re-architectures. Plan for transparent migrations that require minimal or no effort from your users, and build support for migrations into the platform design. This not only reduces disruption for your customers, but also frees up platform engineering capacity for more strategic work. Remember that migrations sometimes do not just change where the work is done but also force changes in the code itself.

Practical Application:

When planning a re-architecture of your AI model serving platform to support large language models (LLMs), plan for staged rollout and iterative development. Start by identifying a small set of LLM use cases that can be supported with minimal changes to the existing architecture, gather feedback from early adopters, and then use that feedback to inform the next phase of development.

5. Manage stakeholder relationships effectively

Stakeholder management is essential for successful platform engineering, especially when dealing with competing demands from different application development teams. Use the power-interest grid to understand your stakeholders’ influence and prioritize your engagement strategies. Effective communication, including transparent sharing of wins and challenges, helps build trust and secure buy-in for platform initiatives.

Practical Application:

An AI product engineer in a leadership position should use the power-interest grid to understand and engage with their key stakeholders effectively. Identifying high-power, high-interest stakeholders (e.g., heads of product, engineering managers of critical projects) and managing those relationships closely is crucial. For example, scheduling regular 1:1s with these stakeholders to gather feedback about the AI platform and address any concerns builds trust and helps maintain alignment.

1. Treat your platform as a product

By treating a platform as a product, platform teams can better understand and address the needs of their internal customers – the application development teams. A product mindset leads to better prioritization of features, more effective communication, and a focus on delivering value. This approach ensures that the platform evolves in a way that maximizes its impact on the organization by focusing on the problems that are most important to solve, and avoids building systems for the sake of systems or due to the latest trends.

Practical Application:

In an AI product engineering team, building a platform for managing machine learning models (MLOps) can greatly improve efficiency. Instead of each team reinventing the wheel for model training, deployment, and monitoring, the platform team can provide standardized tools and workflows, accelerating the delivery of AI-powered features. For example, the platform can automate model deployment to various environments, standardize model performance monitoring, and manage access control to sensitive data.

2. Focus on customer empathy

Engaging with your internal customers – the application development teams – is crucial for building trust and ensuring that your platform meets their needs. Actively listen to their concerns, provide timely support, and involve them in the planning and development process. By treating them as valued partners, you can foster a collaborative relationship that leads to better platform adoption and greater organizational success.

Practical Application:

When implementing a new deep learning framework, don’t just give it to your data scientists and ask them to figure it out. Partner with them during the initial rollout and integration phase, providing support, documentation, and training tailored to their needs. This not only helps them get up to speed quickly but also gathers valuable feedback about how the platform can be improved to better support their specific workflows and use cases. For instance, perhaps they identify a need for GPU scheduling that you did not consider.

3. Manage complexity proactively

Complexity management is an ongoing challenge for platform teams. Don’t just focus on providing quick fixes or enabling short-term user requests. Instead, understand the root cause of problems and design solutions that address the underlying complexities rather than moving them around. This may require a willingness to prioritize less glamorous improvements over new features and to say “no” to requests if they contribute to unmanageable complexity or distract from solving more important problems.

Practical Application:

When faced with a request to integrate a new vector database into your MLOps platform, don’t just throw engineers at the problem to implement it quickly. Take a step back to consider the underlying problem. Is there another way to address the need for storing and querying embeddings, perhaps by improving indexing on your existing search service, or by changing the way these embedding features are trained and deployed? Could you create an abstraction that allows both solutions, then measure to see which one performs better?
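The last question above, building one abstraction over both options and measuring them, can be sketched in a few lines. This is a minimal illustration, not a real integration: `EmbeddingStore`, `VectorDatabaseStore`, `SearchServiceStore`, and `measure` are hypothetical names, and both backends are in-memory stand-ins rather than clients for any actual database or search service.

```python
import math
import time
from typing import Protocol


class EmbeddingStore(Protocol):
    """Hypothetical platform-facing interface; callers depend only on this."""

    def add(self, key: str, vector: list[float]) -> None: ...

    def nearest(self, query: list[float], k: int) -> list[str]: ...


def _norm(vector: list[float]) -> float:
    return math.sqrt(sum(x * x for x in vector))


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class VectorDatabaseStore:
    """Stand-in for the requested vector database (assumption: in-memory only).

    Computes cosine similarity against every stored vector at query time.
    """

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, key: str, vector: list[float]) -> None:
        self._vectors[key] = list(vector)

    def nearest(self, query: list[float], k: int) -> list[str]:
        def score(key: str) -> float:
            denom = _norm(query) * _norm(self._vectors[key])
            return _dot(query, self._vectors[key]) / denom if denom else 0.0

        return sorted(self._vectors, key=score, reverse=True)[:k]


class SearchServiceStore:
    """Stand-in for the existing search service with improved indexing.

    Normalizes vectors once at write time so a query is a plain dot product.
    """

    def __init__(self) -> None:
        self._unit: dict[str, list[float]] = {}

    def add(self, key: str, vector: list[float]) -> None:
        n = _norm(vector)
        self._unit[key] = [x / n for x in vector] if n else list(vector)

    def nearest(self, query: list[float], k: int) -> list[str]:
        return sorted(
            self._unit, key=lambda key: _dot(query, self._unit[key]), reverse=True
        )[:k]


def measure(store: EmbeddingStore, items: dict[str, list[float]],
            queries: list[list[float]], k: int) -> tuple[list[list[str]], float]:
    """Load the same items into a store, then time the same queries against it."""
    for key, vector in items.items():
        store.add(key, vector)
    start = time.perf_counter()
    results = [store.nearest(query, k) for query in queries]
    return results, time.perf_counter() - start
```

Because application code only sees `EmbeddingStore`, the platform team can run `measure` against each candidate with production-like data and decide on evidence, without asking users to change anything when the backend is swapped.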

4. Plan for migrations

A common mistake is to underestimate the importance of migrations when planning major changes like re-architectures. Plan for transparent migrations that require minimal or no effort from your users, and build support for migrations into the platform design. This not only reduces disruption for your customers, but also frees up platform engineering capacity for more strategic work. Remember that some migrations do not just change where the work is done; they also force changes in the code itself.

Practical Application:

When planning a re-architecture of your AI model serving platform to support large language models (LLMs), plan for staged rollout and iterative development. Start by identifying a small set of LLM use cases that can be supported with minimal changes to the existing architecture, gather feedback from early adopters, and then use that feedback to inform the next phase of development.
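One common way to implement a staged rollout like this is to route an explicit list of early adopters to the new serving path and then widen a percentage gate as confidence grows. The sketch below assumes that approach; `bucket`, `use_new_serving_path`, and the caller identifiers are hypothetical and not taken from the book.

```python
import hashlib


def bucket(caller_id: str) -> int:
    """Map a caller to a stable bucket in [0, 100).

    A cryptographic hash is used so the bucket is the same on every
    host and every run, unlike Python's built-in hash().
    """
    digest = hashlib.sha256(caller_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_new_serving_path(caller_id: str, early_adopters: set[str],
                         rollout_percent: int) -> bool:
    """Decide whether a caller is served by the re-architected path.

    Early adopters always opt in; everyone else is admitted once the
    rollout percentage exceeds their bucket.
    """
    if caller_id in early_adopters:
        return True
    return bucket(caller_id) < rollout_percent


# Phase 1 (hypothetical): only the hand-picked LLM use cases move.
early = {"summarization-team"}
phase_one = use_new_serving_path("search-ranking-team", early, rollout_percent=0)
```

Since each caller's bucket never changes, raising `rollout_percent` only ever adds callers to the new path, which keeps the migration predictable for users and makes a rollback as simple as lowering the number.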

5. Manage stakeholder relationships effectively

Stakeholder management is essential for successful platform engineering, especially when dealing with competing demands from different application development teams. Use the power-interest grid to understand your stakeholders’ influence and prioritize your engagement strategies. Effective communication, including transparent sharing of wins and challenges, helps build trust and secure buy-in for platform initiatives.

Practical Application:

An AI product engineer in a leadership position should use the power-interest grid to understand and engage with their key stakeholders effectively. Identifying high-power, high-interest stakeholders (e.g., heads of product, engineering managers of critical projects) and managing those relationships closely is crucial. For example, scheduling regular 1:1s with these stakeholders to gather feedback about the AI platform and address any concerns builds trust and helps maintain alignment.
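The grid itself is simple enough to express as a small classification. The sketch below is illustrative only: the 1-to-5 scoring scale, the threshold, and the stakeholder names are assumptions, and the quadrant labels are the ones commonly used with this technique rather than wording from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stakeholder:
    name: str
    power: int     # assumed scale: 1 (low) to 5 (high)
    interest: int  # assumed scale: 1 (low) to 5 (high)


def engagement_strategy(stakeholder: Stakeholder, threshold: int = 3) -> str:
    """Place a stakeholder in one of the four power-interest quadrants."""
    high_power = stakeholder.power >= threshold
    high_interest = stakeholder.interest >= threshold
    if high_power and high_interest:
        return "manage closely"   # e.g. regular 1:1s
    if high_power:
        return "keep satisfied"
    if high_interest:
        return "keep informed"
    return "monitor"


# Hypothetical stakeholders for an AI platform team.
stakeholders = [
    Stakeholder("Head of Product", power=5, interest=5),
    Stakeholder("Finance partner", power=4, interest=1),
    Stakeholder("Data scientist on a pilot team", power=2, interest=5),
]
plan = {s.name: engagement_strategy(s) for s in stakeholders}
```

Writing the scores down, even roughly, makes it easier to revisit them as projects and priorities shift.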

Suggested Deep Dive

Chapter 5: Platform as a Product

This chapter dives into the core philosophy of treating an internal platform as a product. For an AI product engineer, this is essential for building platforms that are not just technically sound, but also address the specific needs and workflows of AI/ML practitioners. Understanding concepts like ‘paved paths’ and ‘railways’, along with the importance of product discovery and iterative development, will enable AI product engineers to build platforms that are widely adopted and ultimately accelerate the delivery of AI-powered products and features.

Memorable Quotes

Why Platform Engineering Is Becoming Essential. 24

Over the past 25 years, software organizations have experienced a problem: what to do with all of the code, tools, and infrastructure that is shared among multiple teams?

The Over-General Swamp. 28

Rather than reducing maintenance overhead, the cloud and OSS have amplified this problem, because they provide an ever-growing layer of primitives: general-purpose building blocks that provide broad capabilities but are not integrated with one another.

Allowing Application Developers to Operate What They Develop. 46

No one loves being on call. But when teams are only on call for issues caused by their own applications, we have found that a surprising number are willing to take on operational responsibility.

Taking a Curated Product Approach. 56

By product approach, we mean getting out of a purely technical mindset and refocusing on what your customers need from your systems, and their experience using these systems.

Wrapping Up. 167

Lead with resilience, empathy, and vision, and you’ll transform skeptics into believers.

Comparative Analysis

Platform Engineering provides a comprehensive and practical guide to building and managing internal platforms, setting it apart from more theoretical or high-level works. Compared to “Team Topologies”, which focuses on team structures and interactions, Platform Engineering dives deeper into the operational and strategic aspects of platform building. It shares some common ground with books like “The Staff Engineer’s Path” in emphasizing customer focus and the importance of migrations, but it offers more prescriptive advice specifically tailored to platform teams. It distinguishes itself from SRE-focused books, particularly those advocating a simplistic ‘copy Google’s SRE practices’ approach, by arguing that not all platform teams require the same level of operational rigor as Google’s SRE teams, by highlighting the trade-offs involved in pursuing different operational models, and by tying the level of investment to specific business and technical needs.

Reflection

Platform Engineering offers a timely and much-needed perspective on managing the growing complexity of modern software systems. Its emphasis on a product-focused approach, customer empathy, and operational excellence resonates strongly with the current challenges faced by many organizations. However, some of the prescriptive advice, while well-intentioned, may not be universally applicable. The book’s focus on internal platforms within larger organizations may not fully address the needs of smaller companies or those with different organizational structures. Its advocacy for a merged DevOps approach might not be feasible for every platform team, especially at FAANG scale where dedicated SRE or operational teams are the norm. The book also leans toward treating platform engineering as primarily about infrastructure and developer tooling, although other types of platforms may need a different approach. Its strengths lie in its clear articulation of the platform engineering philosophy, its practical advice, and its insights into the organizational and cultural aspects of successful platform building. Despite its minor limitations, Platform Engineering makes a valuable contribution to the field and provides a solid foundation for technology leaders looking to navigate the complexities of modern software development and build platforms that are not just functional but also loved and trusted by their users.

Flashcards

What are the four pillars of platform engineering?

Curated product approach, software-based abstractions, serving a broad base of application developers, and operating as foundations for the business.

What is a platform?

A foundation of self-service APIs, tools, services, knowledge, and support that enable autonomous application teams to deliver product features at a higher pace, with reduced coordination.

What is platform engineering?

The discipline of developing and operating platforms to manage system complexity and deliver leverage to the business.

What is the “over-general swamp”?

The situation where organizations accrue excessive custom integration code (“glue”) due to application teams independently choosing and connecting various cloud and open-source components.

What is a sustainable on-call load for a platform engineer?

Fewer than five business-impacting pages per week.

What’s the difference between rearchitecting and building a v2?

Rearchitecting is an iterative process of improving an existing system, while a v2 approach builds a new system to replace the old one.

What is the Power-Interest Grid?

A stakeholder mapping technique to categorize stakeholders based on their power and interest in your work.

What’s the core principle of platform as a product?

Building internal platforms as curated products with a focus on customer needs and self-service capabilities.
